Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.
File: internals.info, Node: The XEmacs Object System (Abstractly Speaking), Next: How Lisp Objects Are Represented in C, Prev: XEmacs From the Inside, Up: Top
The XEmacs Object System (Abstractly Speaking)
**********************************************
At the heart of the Lisp interpreter is its management of objects.
XEmacs Lisp contains many built-in objects, some of which are simple
and others of which can be very complex; and some of which are very
common, and others of which are rarely used or are only used
internally. (Since the Lisp allocation system, with its automatic
reclamation of unused storage, is so much more convenient than
`malloc()' and `free()', the C code makes extensive use of it in its
internal operations.)
The basic Lisp objects are
`integer'
28 bits of precision, or 60 bits on 64-bit machines; the reason
for this is described below when the internal Lisp object
representation is described.
`float'
Same precision as a double in C.
`cons'
A simple container for two Lisp objects, used to implement lists
and most other data structures in Lisp.
`char'
An object representing a single character of text; chars behave
like integers in many ways but are logically considered text
rather than numbers and have a different read syntax. (The read
syntax for a char contains the char itself or some textual
encoding of it - for example, a Japanese Kanji character might be
encoded as `^[$(B#&^[(B' using the ISO-2022 encoding standard -
rather than the numerical representation of the char; this way, if
the mapping between chars and integers changes, which is quite
possible for Kanji characters and other extended characters, the
same character will still be created. Note that some primitives
confuse chars and integers. The worst culprit is `eq', which
makes a special exception and considers a char to be `eq' to its
integer equivalent, even though in no other case are objects of two
different types `eq'. The reason for this monstrosity is
compatibility with existing code; the separation of char from
integer came fairly recently.)
`symbol'
An object that contains Lisp objects and is referred to by name;
symbols are used to implement variables and named functions and to
provide the equivalent of preprocessor constants in C.
`vector'
A one-dimensional array of Lisp objects providing constant-time
access to any of the objects; access to an arbitrary object in a
vector is faster than for lists, but the operations that can be
done on a vector are more limited.
`string'
Self-explanatory; behaves much like a vector of chars but has a
different read syntax and is stored and manipulated more compactly
and efficiently.
`bit-vector'
A vector of bits; similar to a string in spirit.
`compiled-function'
An object describing compiled Lisp code, known as "byte code".
`subr'
An object describing a Lisp primitive.
Note that there is no basic "function" type, as in more powerful
versions of Lisp (where it's called a "closure"). XEmacs Lisp does not
provide the closure semantics implemented by Common Lisp and Scheme.
The guts of a function in XEmacs Lisp are represented in one of four
ways: a symbol specifying another function (when one function is an
alias for another), a list containing the function's source code, a
bytecode object, or a subr object. (In other words, given a symbol
specifying the name of a function, calling `symbol-function' to
retrieve the contents of the symbol's function cell will return one of
these types of objects.)
XEmacs Lisp also contains numerous specialized objects used to
implement the editor:
`buffer'
Stores text like a string, but is optimized for insertion and
deletion and has certain other properties that can be set.
`frame'
An object with various properties whose displayable representation
is a "window" in window-system parlance.
`window'
A section of a frame that displays the contents of a buffer; often
called a "pane" in window-system parlance.
`window-configuration'
An object that represents a saved configuration of windows in a
frame.
`device'
An object representing a screen on which frames can be displayed;
equivalent to a "display" in the X Window System and a "TTY" in
character mode.
`face'
An object specifying the appearance of text or graphics; it
contains characteristics such as font, foreground color, and
background color.
`marker'
An object that refers to a particular position in a buffer and
moves around as text is inserted and deleted to stay in the same
relative position to the text around it.
`extent'
Similar to a marker but covers a range of text in a buffer; can
also specify properties of the text, such as a face in which the
text is to be displayed, whether the text is invisible or
unmodifiable, etc.
`event'
Generated by calling `next-event' and contains information
describing a particular event happening in the system, such as the
user pressing a key or a process terminating.
`keymap'
An object that maps from events (described using lists, vectors,
and symbols rather than with an event object because the mapping
is for classes of events, rather than individual events) to
functions to execute or other events to recursively look up; the
functions are described by name, using a symbol, or using lists to
specify the function's code.
`glyph'
An object that describes the appearance of an image (e.g. pixmap)
on the screen; glyphs can be attached to the beginning or end of
extents and in some future version of XEmacs will be able to be
inserted directly into a buffer.
`process'
An object that describes a connection to an externally-running
process.
There are some other, less-commonly-encountered general objects:
`hashtable'
An object that maps from an arbitrary Lisp object to another
arbitrary Lisp object, using hashing for fast lookup.
`obarray'
A limited form of hashtable that maps from strings to symbols;
obarrays are used to look up a symbol given its name and are not
actually their own object type but are kludgily represented using
vectors with hidden fields (this representation derives from GNU
Emacs).
`specifier'
A complex object used to specify the value of a display property; a
default value is given and different values can be specified for
particular frames, buffers, windows, devices, or classes of device.
`char-table'
An object that maps from chars or classes of chars to arbitrary
Lisp objects; internally char tables use a complex nested-vector
representation that is optimized to the way characters are
represented as integers.
`range-table'
An object that maps from ranges of integers to arbitrary Lisp
objects.
And some strange special-purpose objects:
`charset'
`coding-system'
Objects used when MULE, or multi-lingual/Asian-language, support is
enabled.
`color-instance'
`font-instance'
`image-instance'
An object that encapsulates a window-system resource; instances are
mostly used internally but are exposed on the Lisp level for
cleanness of the specifier model and because it's occasionally
useful for a Lisp program to create or query the properties of
instances.
`subwindow'
An object that encapsulates a "subwindow" resource, i.e. a
window-system child window that is drawn into by an external
process; this object should be integrated into the glyph system
but isn't yet, and may change form when this is done.
`tooltalk-message'
`tooltalk-pattern'
Objects that represent resources used in the ToolTalk interprocess
communication protocol.
`toolbar-button'
An object used in conjunction with the toolbar.
`x-resource'
An object that encapsulates certain miscellaneous resources in the
X window system, used only when Epoch support is enabled.
And objects that are only used internally:
opaque
A generic object for encapsulating arbitrary memory; this allows
you the generality of `malloc()' and the convenience of the Lisp
object system.
lstream
A buffering I/O stream, used to provide a unified interface to
anything that can accept output or provide input, such as a file
descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a
Lisp string, etc.; it's a Lisp object to make its memory
management more convenient.
char-table-entry
Subsidiary objects in the internal char-table representation.
extent-auxiliary
menubar-data
toolbar-data
Various special-purpose objects that are basically just used to
encapsulate memory for particular subsystems, similar to the more
general "opaque" object.
symbol-value-forward
symbol-value-buffer-local
symbol-value-varalias
symbol-value-lisp-magic
Special internal-only objects that are placed in the value cell of
a symbol to indicate that there is something special with this
variable - e.g. it has no value, it mirrors another variable, or
it mirrors some C variable; there is really only one kind of
object, called a "symbol-value-magic", but it is sort-of halfway
kludged into semi-different object types.
Some types of objects are "permanent", meaning that once created,
they do not disappear until explicitly destroyed, using a function such
as `delete-buffer', `delete-window', `delete-frame', etc. Others will
disappear once they are no longer used, through the garbage collection
mechanism. Buffers, frames, windows, devices, and processes are among
the objects that are permanent. Note that some objects can go both
ways: Faces can be created either way; extents are normally permanent,
but detached extents (extents not referring to any text, as happens to
some extents when the text they are referring to is deleted) are
temporary. Note that some permanent objects, such as faces and coding
systems, cannot be deleted. Note also that windows are unique in that
they can be *undeleted* after having previously been deleted. (This
happens as a result of restoring a window configuration.)
Note that many types of objects have a "read syntax", i.e. a way of
specifying an object of that type in Lisp code. When you load a Lisp
file, or type in code to be evaluated, what really happens is that the
function `read' is called, which reads some text and creates an object
based on the syntax of that text; then `eval' is called, which possibly
does something special; then this loop repeats until there's no more
text to read. (`eval' only actually does something special with
symbols, which causes the symbol's value to be returned, similar to
referencing a variable; and with conses [i.e. lists], which cause a
function invocation. All other values are returned unchanged.)
The read syntax
17297
converts to an integer whose value is 17297.
1.983e-4
converts to a float whose value is 1.983 x 10^-4, or .0001983.
?b
converts to a char that represents the lowercase letter b.
?^[$(B#&^[(B
(where `^[' actually is an `ESC' character) converts to a particular
Kanji character when using an ISO2022-based coding system for input.
(To decode this gook: `ESC' begins an escape sequence; `ESC $ (' is a
class of escape sequences meaning "switch to a 94x94 character set";
`ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively
index into a 94-by-94 array of characters [subtract 33 from the ASCII
value of each character to get the corresponding index]; `ESC (' is a
class of escape sequences meaning "switch to a 94 character set"; `ESC
(B' means "switch to US ASCII". It is a coincidence that the letter
`B' is used to denote both Japanese Kanji and US ASCII. If the first
`B' were replaced with an `A', you'd be requesting a Chinese Hanzi
character from the GB2312 character set.)
"foobar"
converts to a string.
foobar
converts to a symbol whose name is `"foobar"'. This is done by
looking up the string equivalent in the global variable `obarray',
whose contents should be an obarray. If no symbol is found, a new
symbol with the name `"foobar"' is automatically created and added
to `obarray'; this process is called "interning" the symbol.
(foo . bar)
converts to a cons cell containing the symbols `foo' and `bar'.
(1 a 2.5)
converts to a three-element list containing the specified objects
(note that a list is actually a set of nested conses; see the XEmacs
Lisp Reference).
[1 a 2.5]
converts to a three-element vector containing the specified objects.
#[... ... ... ...]
converts to a compiled-function object (the actual contents are not
shown since they are not relevant here; look at a file that ends with
`.elc' for examples).
#*01110110
converts to a bit-vector.
#s(range-table ... ...)
converts to a range table (the actual contents are not shown).
#s(char-table ... ...)
converts to a char table (the actual contents are not shown). (Note
that the #s syntax is the general syntax for structures, which are not
really implemented in XEmacs Lisp but should be.)
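The interning step described above for the symbol `foobar' amounts to a
"look up, and create if absent" operation. The following is only a
minimal sketch of that idea: the names `toy_obarray' and `toy_intern'
are invented for the example, and a linear search stands in for the
hashing that a real obarray uses.

```c
#include <string.h>

/* Toy obarray: a fixed-size table of names.  A "symbol" here is
   simply the index of its slot.  All names are hypothetical. */
#define TOY_OBARRAY_SIZE 64
static const char *toy_obarray[TOY_OBARRAY_SIZE];
static int toy_obarray_count;

/* Return the slot of the symbol named NAME, creating the symbol if
   it does not exist yet.  Return -1 if the table is full. */
static int toy_intern(const char *name)
{
  int i;
  for (i = 0; i < toy_obarray_count; i++)
    if (strcmp(toy_obarray[i], name) == 0)
      return i;                          /* already interned */
  if (toy_obarray_count == TOY_OBARRAY_SIZE)
    return -1;
  toy_obarray[toy_obarray_count] = name; /* caller keeps NAME alive */
  return toy_obarray_count++;
}
```

Reading `foobar' twice therefore yields the same symbol both times,
which is what makes comparing symbols by identity meaningful.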
When an object is printed out (using `print' or a related function),
the read syntax is used, so that the same object can be read in again.
The other objects do not have read syntaxes, usually because it does
not really make sense to create them in this fashion (e.g. processes,
where it doesn't make sense to have a subprocess created as a side
effect of reading some Lisp code), or because they can't be created at
all (e.g. subrs). Permanent objects, as a rule, do not have a read
syntax; nor do most complex objects, which contain too much state to be
easily initialized through a read syntax.
File: internals.info, Node: How Lisp Objects Are Represented in C, Next: Rules When Writing New C Code, Prev: The XEmacs Object System (Abstractly Speaking), Up: Top
How Lisp Objects Are Represented in C
*************************************
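Lisp objects are represented in C using a 32- or 64-bit machine word
(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
most other processors use 32-bit Lisp objects). The representation
stuffs a pointer or an integer together with a tag and a mark bit: on a
32-bit machine, the upper four bits of the word hold a 3-bit tag and a
1-bit mark bit, and the lower 28 bits hold the pointer or integer.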
The tag describes the type of the Lisp object. For integers and
chars, the lower 28 bits contain the value of the integer or char; for
all others, the lower 28 bits contain a pointer. The mark bit is used
during garbage-collection, and is always 0 when garbage collection is
not happening. Many macros that extract out parts of a Lisp object
expect that the mark bit is 0, and will produce incorrect results if
it's not. (The way that garbage collection works, basically, is that it
loops over all places where Lisp objects could exist - this includes
all global variables in C that contain Lisp objects [including
`Vobarray', the C equivalent of `obarray'; through this, all Lisp
variables will get marked], plus various other places - and recursively
scans through the Lisp objects, marking each object it finds by setting
the mark bit. Then it goes through the lists of all objects allocated,
freeing the ones that are not marked and turning off the mark bit of
the ones that are marked.)
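As a rough illustration of this layout, the following sketch packs a
3-bit tag, a mark bit and a 28-bit value into a 32-bit word. The
`toy_' names are invented for the example, and the placement of the
mark bit directly below the tag is an assumption; the real
definitions are in `lisp.h'.

```c
#include <stdint.h>

/* Assumed layout: [ 3-bit tag | 1-bit mark | 28-bit value ]. */
typedef uint32_t toy_object;

#define TOY_VALBITS  28
#define TOY_VALMASK  ((UINT32_C(1) << TOY_VALBITS) - 1)
#define TOY_MARKBIT  (UINT32_C(1) << TOY_VALBITS)
#define TOY_TAGSHIFT (TOY_VALBITS + 1)

/* Build an object from a tag (0..7) and a 28-bit value; mark bit is 0. */
static toy_object toy_make(unsigned tag, uint32_t value)
{
  return ((uint32_t) (tag & 7u) << TOY_TAGSHIFT) | (value & TOY_VALMASK);
}

static unsigned toy_tag(toy_object obj)   { return (unsigned) (obj >> TOY_TAGSHIFT); }
static uint32_t toy_value(toy_object obj) { return obj & TOY_VALMASK; }
static int toy_marked(toy_object obj)     { return (obj & TOY_MARKBIT) != 0; }

/* What the collector does while marking, and undoes while sweeping. */
static toy_object toy_mark(toy_object obj)   { return obj | TOY_MARKBIT; }
static toy_object toy_unmark(toy_object obj) { return obj & ~TOY_MARKBIT; }
```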
Lisp objects use the typedef `Lisp_Object', but the actual C type
used for the Lisp object can vary. It can be either a simple type
(`long' on the DEC Alpha, `int' on other machines) or a structure whose
fields are bit fields that line up properly (actually, it's a union of
structures that's used). Generally the simple integral type is
preferable because it ensures that the compiler will actually use a
machine word to represent the object (some compilers will use more
general and less efficient code for unions and structs even if they can
fit in a machine word). The union type, however, has the advantage of
stricter type checking (if you accidentally pass an integer where a Lisp
object is desired, you get a compile error), and it makes it easier to
decode Lisp objects when debugging. The choice of which type to use is
determined by the presence or absence of the preprocessor constant
`NO_UNION_TYPE'. (Shouldn't it be `USE_UNION_TYPE', with opposite
semantics? "Hysterical reasons", of course.)
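The difference in type checking can be sketched as follows. Both types
here are hypothetical stand-ins, not the actual definitions, and the
order in which a compiler lays out bit fields is
implementation-dependent.

```c
/* Simple integral representation: any integer is silently accepted. */
typedef long toy_int_object;

/* Union representation: a distinct type with named fields. */
typedef union {
  unsigned int bits;            /* the whole word at once */
  struct {
    unsigned int val  : 28;     /* field order varies between compilers */
    unsigned int mark : 1;
    unsigned int tag  : 3;
  } f;
} toy_union_object;

static int takes_int_object(toy_int_object obj)     { return obj != 0; }
static int takes_union_object(toy_union_object obj) { return obj.f.val != 0; }

/* takes_int_object (42)   compiles without complaint;
   takes_union_object (42) is rejected by the compiler, because an
   int cannot be converted to a union.  */
```

The named fields are also what makes the union form easier to inspect
in a debugger.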
Note that there are only eight types that the tag can represent, but
many more actual types than this. This is handled by having one of the
tag types specify a meta-type called a "record"; for all such objects,
the first four bytes of the pointed-to structure indicate what the
actual type is.
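A minimal sketch of the "record" idea, assuming for simplicity that
the type indicator is a small integer (what is actually stored there
may be different; all names below are invented for the example):

```c
/* Hypothetical record types sharing a single tag value. */
enum toy_record_type { TOY_BUFFER, TOY_MARKER, TOY_EXTENT };

/* Every record starts with this header. */
struct toy_record_header { int type; };

struct toy_marker {
  struct toy_record_header header;   /* must be the first member */
  long position;
};

/* Given a pointer to any record, find out what it really is. */
static int toy_record_type_of(const void *ptr)
{
  return ((const struct toy_record_header *) ptr)->type;
}
```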
Note also that having 28 bits for pointers and integers restricts a
lot of things to 256 megabytes of memory. (Basically, enough pointers
and indices and whatnot get stuffed into Lisp objects that the total
amount of memory used by XEmacs can't grow above 256 megabytes. In
older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
allowing for 32 types, which was more than the actual number of types
that existed at the time, and no "record" type was necessary. However,
this limited the editor to 64 megabytes total, which some users who
edited large files might conceivably exceed.)
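The two limits follow directly from the number of bits left over for
the pointer on a 32-bit machine: 32 - 3 - 1 = 28 bits with a 3-bit
tag, and 32 - 5 - 1 = 26 bits with a 5-bit tag. The helper names below
are made up for this check.

```c
/* Number of bytes addressable with BITS pointer bits. */
static unsigned long toy_addressable_bytes(unsigned bits)
{
  return 1UL << bits;
}

/* Convert a byte count to megabytes (1 megabyte = 2^20 bytes). */
static unsigned long toy_megabytes(unsigned long bytes)
{
  return bytes >> 20;
}
```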
Also, note that there is an implicit assumption here that all
pointers are low enough that the top bits are all zero and can just be
chopped off. On standard machines that allocate memory from the bottom
up (and give each process its own address space), this works fine. Some
machines, however, put the data space somewhere else in memory (e.g.
beginning at 0x80000000). Those machines cope by defining
`DATA_SEG_BITS' in the corresponding `m/' or `s/' file to the proper
mask. Then, pointers retrieved from Lisp objects are automatically
OR'ed with this value prior to being used.
A corollary of the previous paragraph is that *(pointers to)
stack-allocated structures cannot be put into Lisp objects*. The stack
is generally located near the top of memory; if you put such a pointer
into a Lisp object, it will get its top bits chopped off, and you will
lose.
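The following sketch shows both effects on a hypothetical machine
whose data space begins at 0x80000000. The macro and function names
are invented; only the masking and OR'ing steps come from the
description above.

```c
#include <stdint.h>

#define TOY_VALMASK       UINT32_C(0x0FFFFFFF)   /* lower 28 bits */
#define TOY_DATA_SEG_BITS UINT32_C(0x80000000)   /* hypothetical machine */

/* Storing a pointer chops off the top four bits. */
static uint32_t toy_store_pointer(uint32_t addr)
{
  return addr & TOY_VALMASK;
}

/* Fetching a pointer ORs the data-segment bits back in. */
static uint32_t toy_fetch_pointer(uint32_t field)
{
  return field | TOY_DATA_SEG_BITS;
}
```

A heap address such as 0x80012340 survives the round trip, but a
stack-like address such as 0xBFFFF000 comes back as 0x8FFFF000, i.e.
pointing somewhere else entirely.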
Various macros are used to construct Lisp objects and extract the
components. Macros of the form `XINT()', `XCHAR()', `XSTRING()',
`XSYMBOL()', etc. mask out the pointer/integer field and cast it to the
appropriate type. All of the macros that construct pointers will `OR'
with `DATA_SEG_BITS' if necessary. `XINT()' needs to be a bit tricky
so that negative numbers are properly sign-extended: Usually it does
this by shifting the number four bits to the left and then four bits to
the right. This assumes that the right-shift operator does an
arithmetic shift (i.e. it leaves the most-significant bit as-is rather
than shifting in a zero, so that it mimics a divide-by-two even for
negative numbers). Not all machines/compilers do this, and on the ones
that don't, a more complicated definition is selected by defining
`EXPLICIT_SIGN_EXTEND'.
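The two approaches to sign extension can be sketched like this,
assuming a 32-bit object with the value in the lower 28 bits. The
function names are invented, and these are not the actual macro
definitions.

```c
#include <stdint.h>

/* Shift-based version.  It relies on `>>' of a negative value being
   an arithmetic shift, which C leaves implementation-defined. */
static int32_t toy_xint_shift(uint32_t obj)
{
  return (int32_t) (obj << 4) >> 4;
}

/* Explicit version: if bit 27 (the sign bit of the 28-bit value) is
   set, subtract 2^28 by hand instead of relying on the shift. */
static int32_t toy_xint_explicit(uint32_t obj)
{
  uint32_t val = obj & UINT32_C(0x0FFFFFFF);
  if (val & UINT32_C(0x08000000))
    return (int32_t) val - (int32_t) 0x10000000;
  return (int32_t) val;
}
```

On a compiler whose right shift is arithmetic, both functions agree
for every input; on one whose shift is logical, only the explicit
version gives correct results for negative numbers.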
Note that when `ERROR_CHECK_TYPECHECK' is defined, the extractor
macros become more complicated - they check the tag bits and/or the
type field in the first four bytes of a record type to ensure that the
object is really of the correct type. This is great for catching places
where an incorrect type is being dereferenced - this typically results
in a pointer being dereferenced as the wrong type of structure, with
unpredictable (and sometimes not easily traceable) results.
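The idea behind the checking extractors can be sketched as follows.
The tag numbers, the position of the tag and all names are assumptions
made for the example; the sketch covers only the tag check, not the
check of the type field of a record.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TAG_INT  0u     /* hypothetical tag numbers */
#define TOY_TAG_CHAR 1u

static unsigned toy_tag(uint32_t obj) { return (unsigned) (obj >> 29); }

/* Unchecked extractor: trusts the caller completely. */
static uint32_t toy_xchar_unchecked(uint32_t obj)
{
  return obj & UINT32_C(0x0FFFFFFF);
}

/* Checking extractor: stops the program if the tag is wrong. */
static uint32_t toy_xchar_checked(uint32_t obj)
{
  assert(toy_tag(obj) == TOY_TAG_CHAR);
  return toy_xchar_unchecked(obj);
}
```

The unchecked version will happily "extract a char" from an integer
object; the checked version catches the mistake at the point where it
is made.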
There are similar `XSETTYPE()' macros that construct a Lisp object.
These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
have to be a statement rather than just used in an expression. The
reason for this is that standard C doesn't let you "construct" a
structure (but GCC does). Granted, this sometimes isn't too convenient;
for the case of integers, at least, you can use the function
`make_number()', which constructs and *returns* an integer Lisp object.
Note that the `XSETTYPE()' macros are also affected by
`ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
right type in the case of record types, where the type is contained in
the structure.
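The difference between the statement form and the function form might
look roughly like this when the structure representation is in use.
`toy_object', `TOY_XSETINT' and `toy_make_number' are illustrative
names only and do not reproduce the real definitions.

```c
/* Hypothetical structure representation of a Lisp object. */
typedef struct {
  unsigned int val  : 28;
  unsigned int mark : 1;
  unsigned int tag  : 3;
} toy_object;

#define TOY_TAG_INT 0

/* Statement-style constructor: fills in an existing lvalue and
   therefore cannot be used inside an expression. */
#define TOY_XSETINT(lvalue, n)            \
  do {                                    \
    (lvalue).tag = TOY_TAG_INT;           \
    (lvalue).mark = 0;                    \
    (lvalue).val = (unsigned int) (n);    \
  } while (0)

/* Function-style constructor: builds the object and returns it. */
static toy_object toy_make_number(int n)
{
  toy_object obj;
  TOY_XSETINT(obj, n);
  return obj;
}
```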
File: internals.info, Node: Rules When Writing New C Code, Next: A Summary of the Various XEmacs Modules, Prev: How Lisp Objects Are Represented in C, Up: Top
Rules When Writing New C Code
*****************************
The XEmacs C Code is extremely complex and intricate, and there are
many rules that are more or less consistently followed throughout the
code. Many of these rules are not obvious, so they are explained here.
It is of the utmost importance that you follow them. If you don't,
you may get something that appears to work, but which will crash in odd
situations, often in code far away from where the actual breakage is.
* Menu:
* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Techniques for XEmacs Developers::
File: internals.info, Node: General Coding Rules, Next: Writing Lisp Primitives, Up: Rules When Writing New C Code
General Coding Rules
====================
Almost every module contains a `syms_of_*()' function and a
`vars_of_*()' function. The former declares any Lisp primitives you
have defined and defines any symbols you will be using. The latter
declares any global Lisp variables you have added and initializes global
C variables in the module. For each such function, declare it in
`symsinit.h' and make sure it's called in the appropriate place in
`emacs.c'. *Important*: There are stringent requirements on exactly
what can go into these functions. See the comment in `emacs.c'. The
reason for this is to avoid obscure unwanted interactions during
initialization. If you don't follow these rules, you'll be sorry! If
you want to do anything that isn't allowed, create a
`complex_vars_of_*()' function for it. Doing this is tricky, though:
You have to make sure your function is called at the right time so that
all the initialization dependencies work out.
Every module includes `<config.h>' (angle brackets so that
`--srcdir' works correctly; `config.h' may or may not be in the same
directory as the C sources) and `lisp.h'. `config.h' should always be
included before any other header files (including system header files)
to ensure that certain tricks played by various `s/' and `m/' files
work out correctly.
*All global and static variables that are to be modifiable must be
declared uninitialized.* This means that you may not use the "declare
with initializer" form for these variables, such as `int some_variable
= 0;'. The reason for this has to do with some kludges done during the
dumping process: If possible, the initialized data segment is re-mapped
so that it becomes part of the (unmodifiable) code segment in the
dumped executable. This allows this memory to be shared among multiple
running XEmacs processes. XEmacs is careful to place as much constant
data as possible into initialized variables (in particular, into what's
called the "pure space" - see below) during the `temacs' phase.
*Note:* This kludge only works on a few systems nowadays, and is
rapidly becoming irrelevant because most modern operating systems
provide "copy-on-write" semantics. All data is initially shared between
processes, and a private copy is automatically made (on a page-by-page
basis) when a process first attempts to write to a page of memory.
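In practice the rule means moving the initial value out of the
declaration and into the module's initialization function. A minimal
sketch, for a hypothetical module called "toy":

```c
/* Not allowed under the rule above:
     int toy_some_variable = 0;
   Declare the variable without an initializer instead. */
int toy_some_variable;

/* Hypothetical vars_of_*() function for the module. */
void vars_of_toy(void)
{
  toy_some_variable = 0;     /* initialize at run time */
}
```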
Formerly, there was a requirement that static variables not be
declared inside of functions. This had to do with another hack in
the same vein as what was just described: old USG systems put
statically-declared variables in the initialized data space, so the
header files for those systems had a `#define static' declaration.
(That way, the data-segment remapping described above could still
work.) This fails
badly on static variables inside of functions, which suddenly become
automatic variables; therefore, you weren't supposed to have any of
them. This awful kludge has been removed in XEmacs because
1. almost all of the systems that used this kludge ended up having to
disable the data-segment remapping anyway;
2. the only systems that didn't were extremely outdated ones;
3. this hack completely messed up inline functions.
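To see why defining `static' away was so harmful inside functions,
compare a function as written with what the compiler effectively saw
once the keyword had been removed. Both functions are invented for the
illustration.

```c
/* As written: the counter keeps its value between calls. */
int toy_next_id(void)
{
  static int counter = 0;
  return ++counter;
}

/* The same function after `#define static' has taken effect: the
   counter is now an automatic variable, reset on every call. */
int toy_next_id_broken(void)
{
  int counter = 0;
  return ++counter;
}
```

Successive calls to the first function return 1, 2, 3 and so on; the
second returns 1 every time.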
File: internals.info, Node: Writing Lisp Primitives, Next: Adding Global Lisp Variables, Prev: General Coding Rules, Up: Rules When Writing New C Code
Writing Lisp Primitives
=======================
Lisp primitives are Lisp functions implemented in C. The details of
interfacing the C function so that Lisp can call it are handled by a few
C macros. The only way to really understand how to write new C code is
to read the source, but we can explain some things here.
An example of a special form is the definition of `or', from
`eval.c'. (An ordinary function would have the same general
appearance.)
     DEFUN ("or", For, 0, UNEVALLED, 0, /*
     Eval args until one of them yields non-nil, then return that value.
     The remaining args are not evalled at all.
     If all args return nil, return nil.
     */
            (args))
     {
       /* This function can GC */
       REGISTER Lisp_Object val;
       Lisp_Object args_left;
       struct gcpro gcpro1;

       if (NILP (args))
         return Qnil;

       args_left = args;
       GCPRO1 (args_left);

       do
         {
           val = Feval (Fcar (args_left));
           if (!NILP (val))
             break;
           args_left = Fcdr (args_left);
         }
       while (!NILP (args_left));

       UNGCPRO;
       return val;
     }
Let's start with a precise explanation of the arguments to the
`DEFUN' macro. Here is a template for them:
DEFUN (LNAME, FNAME, MIN, MAX, INTERACTIVE, /*
DOCSTRING
*/
(ARGLIST) )
LNAME
This string is the name of the Lisp symbol to define as the
function name; in the example above, it is `"or"'.
FNAME
This is the C function name for this function. This is the name
that is used in C code for calling the function. The name is, by
convention, `F' prepended to the Lisp name, with all dashes (`-')
in the Lisp name changed to underscores. Thus, to call this
function from C code, call `For'. Remember that the arguments are
of type `Lisp_Object'; various macros and functions for creating
values of type `Lisp_Object' are declared in the file `lisp.h'.
Primitives whose names are special characters (e.g. `+' or `<')
are named by spelling out, in some fashion, the special character:
e.g. `Fplus()' or `Flss()'. Primitives whose names begin with
normal alphanumeric characters but also contain special characters
are spelled out in some creative way, e.g. `let*' becomes
`FletX()'.
Each function also has an associated structure that holds the data
for the subr object that represents the function in Lisp. This
structure conveys the Lisp symbol name to the initialization
routine that will create the symbol and store the subr object as
its definition. The C variable name of this structure is always
`S' prepended to the FNAME. You hardly ever need to be aware of
the existence of this structure.
MIN
This is the minimum number of arguments that the function
requires. The function `or' allows a minimum of zero arguments.
MAX
This is the maximum number of arguments that the function accepts,
if there is a fixed maximum. Alternatively, it can be `UNEVALLED',
indicating a special form that receives unevaluated arguments, or
`MANY', indicating an unlimited number of evaluated arguments (the
equivalent of `&rest'). Both `UNEVALLED' and `MANY' are macros.
If MAX is a number, it may not be less than MIN and it may not be
greater than 8. (If you need to add a function with more than 8
arguments, either use the `MANY' form or edit the definition of
`DEFUN' in `lisp.h'. If you do the latter, make sure to also add
another clause to the switch statement in `primitive_funcall()'.)
INTERACTIVE
This is an interactive specification, a string such as might be
used as the argument of `interactive' in a Lisp function. In the
case of `or', it is 0 (a null pointer), indicating that `or'
cannot be called interactively. A value of `""' indicates a
function that should receive no arguments when called
interactively.
DOCSTRING
This is the documentation string. It is written just like a
documentation string for a function defined in Lisp; in
particular, the first line should be a single sentence. Note how
the documentation string is enclosed in a comment, none of the
documentation is placed on the same lines as the comment-start and
comment-end characters, and the comment-start characters are on
the same line as the interactive specification. `make-docfile',
which scans the C files for documentation strings, is very
particular about what it looks for, and will not properly extract
the doc string if it's not in this exact format.
You are free to put the various arguments to `DEFUN' on separate
lines to avoid overly long lines. However, make sure to put the
comment-start characters for the doc string on the same line as the
interactive specification, and put a newline directly after them
(and before the comment-end characters).
ARGLIST
This is the comma-separated list of arguments to the C function.
For a function with a fixed maximum number of arguments, provide a
C argument for each Lisp argument. In this case, unlike regular C
functions, the types of the arguments are not declared; they are
simply always of type `Lisp_Object'.
The names of the C arguments will be used as the names of the
arguments to the Lisp primitive as displayed in its documentation,
modulo the same concerns described above for `F...' names (in
particular, underscores in the C arguments become dashes in the
Lisp arguments).
There is one additional kludge: A trailing `_' on the C argument is
discarded when forming the Lisp argument. This allows C language
reserved words (like `default') or global symbols (like `dirname')
to be used as argument names without compiler warnings or errors.
A Lisp function with MAX = `UNEVALLED' is a "special form"; its
arguments are not evaluated. Instead it receives one argument of
type `Lisp_Object', a (Lisp) list of the unevaluated arguments,
conventionally named `(args)'.
When a Lisp function has no upper limit on the number of arguments,
specify MAX = `MANY'. In this case its implementation in C
actually receives exactly two arguments: the number of Lisp
arguments (an `int') and the address of a block containing their
values (a `Lisp_Object *'). Only in this case are the C types
specified in the ARGLIST: `(int nargs, Lisp_Object *args)'.
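The `MANY' calling convention can be imitated in plain C as shown
below. `Toy_Object' is a stand-in for `Lisp_Object', and `toy_Fplus'
is an invented function rather than the real `Fplus()'; only the
shape of the argument list is taken from the description above.

```c
typedef long Toy_Object;   /* stand-in for Lisp_Object, not the real type */

/* Hypothetical MANY-style primitive: receives the argument count and
   the address of a block of argument values, and adds them up. */
static Toy_Object toy_Fplus(int nargs, Toy_Object *args)
{
  Toy_Object sum = 0;
  int i;
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```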
Within the function `For' itself, note the use of the macros
`GCPRO1' and `UNGCPRO'. `GCPRO1' is used to "protect" a variable from
garbage collection--to inform the garbage collector that it must look
in that variable and regard its contents as an accessible object. This
is necessary whenever you call `Feval' or anything that can directly or
indirectly call `Feval' (this includes the `QUIT' macro!). At such a
time, any Lisp object that you intend to refer to again must be
protected somehow. `UNGCPRO' cancels the protection of the variables
that are protected in the current function. It is necessary to do this
explicitly.
The macro `GCPRO1' protects just one local variable. If you want to
protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work.
Macros `GCPRO3' and `GCPRO4' also exist.
These macros implicitly use local variables such as `gcpro1'; you
must declare these explicitly, with type `struct gcpro'. Thus, if you
use `GCPRO2', you must declare `gcpro1' and `gcpro2'.
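One way to picture what these macros do is as pushing records onto a
chain that the garbage collector walks. The following is a simplified
model written for this explanation, with invented `toy_' names; the
real macros and `struct gcpro' are defined in `lisp.h' and may differ
in detail.

```c
typedef long Toy_Object;   /* stand-in for Lisp_Object */

struct toy_gcpro {
  struct toy_gcpro *next;  /* previously registered record */
  Toy_Object *var;         /* address of the protected variable */
  int nvars;               /* number of variables covered */
};

/* Head of the chain of currently protected variables. */
static struct toy_gcpro *toy_gcprolist;

/* The macros refer to local variables named toy_gcpro1, toy_gcpro2,
   which the calling function must declare itself. */
#define TOY_GCPRO1(v)                                           \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v),      \
   toy_gcpro1.nvars = 1, toy_gcprolist = &toy_gcpro1)

#define TOY_GCPRO2(v1, v2)                                      \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v1),     \
   toy_gcpro1.nvars = 1,                                        \
   toy_gcpro2.next = &toy_gcpro1, toy_gcpro2.var = &(v2),       \
   toy_gcpro2.nvars = 1, toy_gcprolist = &toy_gcpro2)

#define TOY_UNGCPRO (toy_gcprolist = toy_gcpro1.next)

/* Count the variables a collector walking the chain would find. */
static int toy_count_protected(void)
{
  int n = 0;
  struct toy_gcpro *p;
  for (p = toy_gcprolist; p; p = p->next)
    n += p->nvars;
  return n;
}

/* Protect two locals, count what is visible, then unprotect. */
static int toy_demo(void)
{
  Toy_Object a = 1, b = 2;
  struct toy_gcpro toy_gcpro1, toy_gcpro2;
  int seen;

  TOY_GCPRO2(a, b);
  seen = toy_count_protected();
  TOY_UNGCPRO;
  return seen;
}
```

In this model it is easy to see why repeating the one-variable macro
cannot work: the second use would overwrite the same `toy_gcpro1'
record instead of adding a new one to the chain.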
Note also that the general rule is "caller-protects"; i.e. you are
only responsible for protecting those Lisp objects that you create.
Any objects passed to you as parameters should have been protected by
whoever created them, so you don't in general have to protect them.
`For' is an exception; it protects its parameters to provide extra
assurance against Lisp primitives elsewhere that are incorrectly
written, and against malicious self-modifying code. There are a few
other standard functions that also do this.
`GCPRO'ing is perhaps the trickiest and most error-prone part of
XEmacs coding. It is *extremely* important that you get this right and
use a great deal of discipline when writing this code. *Note
`GCPRO'ing: GCPROing, for full details on how to do this.
What `DEFUN' actually does is declare a global structure of type
`Lisp_Subr' whose name begins with capital `SF' and which contains
information about the primitive (e.g. a pointer to the function, its
minimum and maximum allowed arguments, a string describing its Lisp
name); `DEFUN' then begins a normal C function declaration using the
`F...' name. The Lisp subr object that is the function definition of a
primitive (i.e. the object in the function slot of the symbol that
names the primitive) actually points to this `SF' structure; when
`Feval' encounters a subr, it looks in the structure to find out how to
call the C function.
Defining the C function is not enough to make a Lisp primitive
available; you must also create the Lisp symbol for the primitive (the
symbol is "interned"; *note Obarrays::.) and store a suitable subr
object in its function cell. (If you don't do this, the primitive won't
be seen by Lisp code.) The code looks like this:
     DEFSUBR (FNAME);
Here FNAME is the name you used as the second argument to `DEFUN'.
This call to `DEFSUBR' should go in the `syms_of_*()' function at
the end of the module. If no such function exists, create it and make
sure to also declare it in `symsinit.h' and call it from the
appropriate spot in `main()'. *Note General Coding Rules::.
Note that C code cannot call functions by name unless they are
defined in C. The way to call a function written in Lisp from C is to
use `Ffuncall', which embodies the Lisp function `funcall'. Since the
Lisp function `funcall' accepts an unlimited number of arguments, in C
it takes two: the number of Lisp-level arguments, and a one-dimensional
array containing their values. The first Lisp-level argument is the
Lisp function to call, and the rest are the arguments to pass to it.
Since `Ffuncall' can call the evaluator, you must protect pointers from
garbage collection around the call to `Ffuncall'. (However, `Ffuncall'
explicitly protects all of its parameters, so you don't have to protect
any pointers passed as parameters to it.)
The C functions `call0', `call1', `call2', and so on, provide a
convenient way to call a Lisp function with a fixed number of
arguments. They work by calling `Ffuncall'.
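The calling convention described above (a count plus an array whose
first element is the function) can be sketched with a toy object type.
Everything named `toy_...', as well as `make_int', `make_function' and
the by-value `Lisp_Object' struct, is hypothetical and exists only for
this illustration; the real `Ffuncall' and `call2' operate on real Lisp
objects and involve the evaluator.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: either an integer or a callable C function. */
struct toy_object;
typedef struct toy_object Lisp_Object;
typedef Lisp_Object (*toy_fn) (int nargs, Lisp_Object *args);

struct toy_object
{
  toy_fn fn;     /* non-NULL when the object is callable */
  long value;    /* integer payload otherwise */
};

Lisp_Object
make_int (long n)
{
  Lisp_Object o;
  o.fn = NULL;
  o.value = n;
  return o;
}

Lisp_Object
make_function (toy_fn fn)
{
  Lisp_Object o;
  o.fn = fn;
  o.value = 0;
  return o;
}

/* Model of Ffuncall: args[0] is the function to call and
   args[1] .. args[nargs - 1] are the arguments passed to it. */
Lisp_Object
toy_funcall (int nargs, Lisp_Object *args)
{
  assert (nargs >= 1 && args[0].fn != NULL);
  return args[0].fn (nargs - 1, args + 1);
}

/* Model of call2: pack a fixed number of arguments into an array
   and hand the array to the general entry point. */
Lisp_Object
toy_call2 (Lisp_Object fn, Lisp_Object arg1, Lisp_Object arg2)
{
  Lisp_Object args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return toy_funcall (3, args);
}

/* A callable that sums any number of integer arguments; it receives
   its arguments in the same (count, array) form. */
Lisp_Object
toy_plus (int nargs, Lisp_Object *args)
{
  long sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i].value;
  return make_int (sum);
}
```

With these definitions, `toy_call2 (make_function (toy_plus),
make_int (2), make_int (3))' yields an object whose `value' is 5.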
`eval.c' is a very good file to look through for examples; `lisp.h'
contains the definitions for some important macros and functions.
File: internals.info, Node: Adding Global Lisp Variables, Next: Techniques for XEmacs Developers, Prev: Writing Lisp Primitives, Up: Rules When Writing New C Code
Adding Global Lisp Variables
============================
Global variables whose names begin with `Q' are constants whose
value is a symbol of a particular name. The name of the variable should
be derived from the name of the symbol using the same rules as for Lisp
primitives. These variables are initialized using a call to
`defsymbol()' in the `syms_of_*()' function. (This call interns a
symbol, sets the C variable to the resulting Lisp object, and calls
`staticpro()' on the C variable to tell the garbage-collection
mechanism about this variable. What `staticpro()' does is add a
pointer to the variable to a large global array; when
garbage-collection happens, all pointers listed in the array are used
as starting points for marking Lisp objects. This is important because
it's quite possible that the only current reference to the object is
the C variable. In the case of symbols, the `staticpro()' doesn't
matter all that much because the symbol is contained in `obarray',
which is itself `staticpro()'ed. However, it's possible that a naughty
user could do something like uninterning the symbol out of `obarray' or
even setting `obarray' to a different value [although this is likely to
make XEmacs crash!].)
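The root-array idea behind `staticpro()' can be modeled in a few lines.
This is only a sketch: `toy_staticpro', `toy_mark_roots', the fixed-size
`staticvec' array and the one-field object type are assumptions for
illustration, not the code in the allocator. The point it shows is that
the array records the *address* of the C variable, so whatever object
the variable holds at collection time is the one that gets marked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with nothing but a mark bit (an assumption). */
typedef struct toy_object { int marked; } *Lisp_Object;

#define MAX_ROOTS 64

/* Global array of pointers to C variables that hold Lisp objects. */
static Lisp_Object *staticvec[MAX_ROOTS];
static int staticidx = 0;

/* Register the address of a C variable as a garbage-collection root. */
void
toy_staticpro (Lisp_Object *varaddress)
{
  assert (staticidx < MAX_ROOTS);
  staticvec[staticidx++] = varaddress;
}

/* Start of the mark phase: look through every registered variable
   and mark whatever object it currently contains. */
void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < staticidx; i++)
    if (*staticvec[i] != NULL)
      (*staticvec[i])->marked = 1;
}

static struct toy_object obj_a, obj_b;
Lisp_Object Vregistered;   /* registered as a root below */
Lisp_Object Vforgotten;    /* never registered */

/* Returns 1 if only the object held by the registered variable
   was marked. */
int
demo_staticpro (void)
{
  obj_a.marked = obj_b.marked = 0;
  toy_staticpro (&Vregistered);
  Vregistered = &obj_a;    /* assigned after registration: still found */
  Vforgotten = &obj_b;     /* invisible to the mark phase */
  toy_mark_roots ();
  return obj_a.marked == 1 && obj_b.marked == 0;
}
```

In a real collector the unmarked object would be reclaimed, which is
the failure that an unregistered variable invites.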
*Note:* It is potentially deadly if you declare a `Q...' variable
in two different modules. The two calls to `defsymbol()' are no
problem, but some linkers will complain about multiply-defined symbols.
The most insidious aspect of this is that often the link will succeed
anyway, but then the resulting executable will sometimes crash in
obscure ways during certain operations! To avoid this problem, declare
any symbols with common names (such as `text') that are not obviously
associated with this particular module in the module `general.c'.
Global variables whose names begin with `V' are variables that
contain Lisp objects. The convention here is that all global variables
of type `Lisp_Object' begin with `V', and all others don't (including
integer and boolean variables that have Lisp equivalents). Most of the
time, these variables have equivalents in Lisp, but some don't. Those
that do are declared this way by a call to `DEFVAR_LISP()' in the
`vars_of_*()' initializer for the module. What this does is create a
special "symbol-value-forward" Lisp object that contains a pointer to
the C variable, intern a symbol whose name is as specified in the call
to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp
object; it also calls `staticpro()' on the C variable to tell the
garbage-collection mechanism about the variable. When `eval' (or
actually `symbol-value') encounters this special object in the process
of retrieving a variable's value, it follows the indirection to the C
variable and gets its value. `setq' does similar things so that the C
variable gets changed.
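The indirection through a forwarding object can be sketched as follows.
The types and functions here (`toy_value', `toy_symbol', `toy_defvar',
`toy_symbol_value', `toy_set') are hypothetical, and plain `long'
values stand in for Lisp objects; the sketch only shows how reads and
writes through the symbol end up at the C variable.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_INT, TOY_FORWARD };

/* A symbol's value slot: an ordinary value, or a forwarding object
   that points at a C variable. */
struct toy_value
{
  enum toy_type type;
  long integer;      /* used when type == TOY_INT */
  long *forward;     /* used when type == TOY_FORWARD */
};

struct toy_symbol
{
  const char *name;
  struct toy_value value;
};

/* Model of DEFVAR_LISP: store a forwarding object in the symbol. */
void
toy_defvar (struct toy_symbol *sym, const char *name, long *c_variable)
{
  sym->name = name;
  sym->value.type = TOY_FORWARD;
  sym->value.integer = 0;
  sym->value.forward = c_variable;
}

/* Model of symbol-value: follow the indirection if there is one. */
long
toy_symbol_value (const struct toy_symbol *sym)
{
  if (sym->value.type == TOY_FORWARD)
    return *sym->value.forward;
  return sym->value.integer;
}

/* Model of setq: write through the indirection if there is one. */
void
toy_set (struct toy_symbol *sym, long newval)
{
  if (sym->value.type == TOY_FORWARD)
    *sym->value.forward = newval;
  else
    sym->value.integer = newval;
}

static long Vtoy_variable;

/* Returns 1 if reads and writes on both sides stay in sync. */
int
demo_forward (void)
{
  struct toy_symbol sym;

  Vtoy_variable = 10;
  toy_defvar (&sym, "toy-variable", &Vtoy_variable);
  if (toy_symbol_value (&sym) != 10)
    return 0;
  toy_set (&sym, 42);              /* "Lisp" writes, C sees it */
  if (Vtoy_variable != 42)
    return 0;
  Vtoy_variable = 7;               /* C writes, "Lisp" sees it */
  return toy_symbol_value (&sym) == 7;
}
```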
Whether or not you `DEFVAR_LISP()' a variable, you need to
initialize it in the `vars_of_*()' function; otherwise it will end up
as all zeroes, which is the integer 0 (*not* `nil'), and this is
probably not what you want. Also, if the variable is not
`DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
the `vars_of_*()' function. Otherwise, the garbage-collection
mechanism won't know that the object in this variable is in use, and
will happily collect it and reuse its storage for another Lisp object,
and you will be the one who's unhappy when you can't figure out how
your variable got overwritten.
File: internals.info, Node: Techniques for XEmacs Developers, Prev: Adding Global Lisp Variables, Up: Rules When Writing New C Code
Techniques for XEmacs Developers
================================
To make a quantified XEmacs, do: `make quantmacs'.
You simply can't dump Quantified and Purified images. Instead, run the
image like so: `quantmacs -batch -l loadup.el run-temacs -q'.
Before you go through the trouble, are you compiling with all
debugging and error-checking off? If not, try that first. Be warned
that while Quantify is directly responsible for quite a few
optimizations which have been made to XEmacs, doing a run which
generates results which can be acted upon is not necessarily a trivial
task.
Also, if you're still willing to do some runs, make sure you configure
with the `--quantify' flag. That will keep Quantify from starting to
record data until after the loadup is completed and will shut off
recording right before XEmacs shuts down (loadup and shutdown generate
enough bogus data to throw most results off). It also enables three
additional elisp commands: `quantify-start-recording-data',
`quantify-stop-recording-data' and `quantify-clear-data'.
To get started debugging XEmacs, take a look at the `gdbinit' and
`dbxrc' files in the `src' directory.
File: internals.info, Node: A Summary of the Various XEmacs Modules, Next: Allocation of Objects in XEmacs Lisp, Prev: Rules When Writing New C Code, Up: Top
A Summary of the Various XEmacs Modules
***************************************
This is accurate as of XEmacs 20.0.
* Menu:
* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::
File: internals.info, Node: Low-Level Modules, Next: Basic Lisp Modules, Up: A Summary of the Various XEmacs Modules
Low-Level Modules
=================
size name
------- ---------------------
18150 config.h
This is automatically generated from `config.h.in' based on the
results of configure tests and user-selected optional features and
contains preprocessor definitions specifying the nature of the
environment in which XEmacs is being compiled.
2347 paths.h
This is automatically generated from `paths.h.in' based on supplied
configure values, and allows for non-standard installed configurations
of the XEmacs directories. It's currently broken, though.
47878 emacs.c
20239 signal.c
`emacs.c' contains `main()' and other code that performs the most
basic environment initializations and handles shutting down the XEmacs
process (this includes `kill-emacs', the normal way that XEmacs is
exited; `dump-emacs', which is used during the build process to write
out the XEmacs executable; `run-emacs-from-temacs', which can be used
to start XEmacs directly when temacs has finished loading all the Lisp
code; and emergency code to handle crashes [XEmacs tries to auto-save
all files before it crashes]).
Low-level code that directly interacts with the Unix signal
mechanism, however, is in `signal.c'. Note that this code does not
handle system dependencies in interfacing to signals; that is handled
using the `syssignal.h' header file, described below in the section on
modules for interfacing with the operating system.
23458 unexaix.c
9893 unexalpha.c
11302 unexapollo.c
16544 unexconvex.c
31967 unexec.c
30959 unexelf.c
35791 unexelfsgi.c
3207 unexencap.c
7276 unexenix.c
20539 unexfreebsd.c
1153 unexfx2800.c
13432 unexhp9k3.c
11049 unexhp9k800.c
9165 unexmips.c
8981 unexnext.c
1673 unexsol2.c
19261 unexsunos4.c
These modules contain code for dumping out the XEmacs executable on
various different systems. (This process is highly machine-specific and
requires intimate knowledge of the executable format and the memory map
of the process.) Only one of these modules is actually used; this is
chosen by `configure'.
15715 crt0.c
1484 lastfile.c
1115 pre-crt0.c
These modules are used in conjunction with the dump mechanism. On
some systems, an alternative version of the C startup code (the actual
code that receives control from the operating system when the process is
started, and which calls `main()') is required so that the dumping
process works properly; `crt0.c' provides this.
`pre-crt0.c' and `lastfile.c' should be the very first and very last
files linked, respectively. (Actually, this is not really true.
`lastfile.c' should be after all Emacs modules whose initialized data
should be made constant, and before all other Emacs files and all
libraries. In particular, the allocation modules `gmalloc.c',
`alloca.c', etc. are normally placed past `lastfile.c', and all of the
files that implement Xt widget classes *must* be placed after
`lastfile.c' because they contain various structures that must be
statically initialized and into which Xt writes at various times.)
`pre-crt0.c' and `lastfile.c' contain exported symbols that are used to
determine the start and end of XEmacs' initialized data space when
dumping.
14786 alloca.c
16678 free-hook.c
1692 getpagesize.h
41936 gmalloc.c
25141 malloc.c
3802 mem-limits.h
39011 ralloc.c
3436 vm-limit.c
These handle basic C allocation of memory. `alloca.c' is an
emulation of the stack allocation function `alloca()' on machines that
lack this. (XEmacs makes extensive use of `alloca()' in its code.)
`gmalloc.c' and `malloc.c' are two implementations of the standard C
functions `malloc()', `realloc()' and `free()'. They are often used in
place of the standard system-provided `malloc()' because they usually
provide a much faster implementation, at the expense of additional
memory use. `gmalloc.c' is a newer implementation that is much more
memory-efficient for large allocations than `malloc.c', and should
always be preferred if it works. (At one point, `gmalloc.c' didn't work
on some systems where `malloc.c' worked; but this should be fixed now.)
`ralloc.c' is the "relocating allocator". It provides functions
similar to `malloc()', `realloc()' and `free()' that allocate memory
that can be dynamically relocated in memory. The advantage of this is
that allocated memory can be shuffled around to place all the free
memory at the end of the heap, and the heap can then be shrunk,
releasing the memory back to the operating system. The use of this can
be controlled with the configure option `--rel-alloc'; if enabled,
memory allocated for buffers will be relocatable, so that if a very
large file is visited and the buffer is later killed, the memory can be
released to the operating system. (The disadvantage of this mechanism
is that it can be very slow. On systems with the `mmap()' system call,
the XEmacs version of `ralloc.c' uses this to move memory around
without actually having to block-copy it, which can speed things up;
but it can still cause noticeable performance degradation.)
`free-hook.c' contains some debugging functions for checking for
invalid arguments to `free()'.
`vm-limit.c' contains some functions that warn the user when memory
is getting low. These are callback functions that are called by
`gmalloc.c' and `malloc.c' at appropriate times.
`getpagesize.h' provides a uniform interface for retrieving the size
of a page in virtual memory. `mem-limits.h' provides a uniform
interface for retrieving the total amount of available virtual memory.
Both are similar in spirit to the `sys*.h' files described below in the
section on modules for interfacing with the operating system.
2659 blocktype.c
1410 blocktype.h
7194 dynarr.c
2671 dynarr.h
These implement a couple of basic C data types to facilitate memory
allocation. The `Blocktype' type efficiently manages the allocation of
fixed-size blocks by minimizing the number of times that `malloc()' and
`free()' are called. It allocates memory in large chunks, subdivides
the chunks into blocks of the proper size, and returns the blocks as
requested. When blocks are freed, they are placed onto a linked list,
so they can be efficiently reused. This data type is not much used in
XEmacs currently, because it's a fairly new addition.
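The allocation strategy just described can be sketched as a small
fixed-size block allocator. This is not the code in `blocktype.c': the
names `toy_blocktype', `toy_chunk', `toy_align' and the functions below
are hypothetical, and details such as the alignment rule and the chunk
list used for cleanup are assumptions made so that the example is
complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Strictest alignment a block is assumed to need in this sketch. */
union toy_align { void *p; long l; double d; long double ld; };

/* Header at the start of each malloc()ed chunk, kept for cleanup. */
struct toy_chunk { struct toy_chunk *next; };

struct toy_blocktype
{
  size_t elsize;             /* rounded-up size of one block */
  size_t per_chunk;          /* blocks carved out of each chunk */
  void *free_list;           /* linked list of free blocks */
  struct toy_chunk *chunks;  /* every chunk obtained so far */
  int malloc_calls;          /* how often malloc() was needed */
};

static size_t
round_up (size_t n)
{
  size_t unit = sizeof (union toy_align);
  return (n + unit - 1) / unit * unit;
}

void
toy_blocktype_init (struct toy_blocktype *b, size_t elsize, size_t per_chunk)
{
  assert (elsize > 0 && per_chunk > 0);
  b->elsize = round_up (elsize);
  b->per_chunk = per_chunk;
  b->free_list = NULL;
  b->chunks = NULL;
  b->malloc_calls = 0;
}

/* Freed blocks go onto the linked list; the link lives in the block. */
void
toy_blocktype_free (struct toy_blocktype *b, void *block)
{
  *(void **) block = b->free_list;
  b->free_list = block;
}

void *
toy_blocktype_alloc (struct toy_blocktype *b)
{
  void *block;

  if (b->free_list == NULL)
    {
      /* No free block: get one large chunk and subdivide it. */
      size_t header = round_up (sizeof (struct toy_chunk));
      void *raw = malloc (header + b->elsize * b->per_chunk);
      struct toy_chunk *chunk = raw;
      size_t i;

      if (raw == NULL)
        return NULL;
      chunk->next = b->chunks;
      b->chunks = chunk;
      b->malloc_calls++;
      for (i = 0; i < b->per_chunk; i++)
        toy_blocktype_free (b, (char *) raw + header + i * b->elsize);
    }
  block = b->free_list;
  b->free_list = *(void **) block;
  return block;
}

/* Release every chunk; all blocks become invalid. */
void
toy_blocktype_destroy (struct toy_blocktype *b)
{
  while (b->chunks != NULL)
    {
      struct toy_chunk *next = b->chunks->next;
      free (b->chunks);
      b->chunks = next;
    }
  b->free_list = NULL;
}

/* Allocate four blocks, free one, allocate again; returns the number
   of malloc() calls that were needed. */
int
demo_blocktype (void)
{
  struct toy_blocktype b;
  void *p[4], *again;
  int i, calls;

  toy_blocktype_init (&b, sizeof (int), 4);
  for (i = 0; i < 4; i++)
    {
      p[i] = toy_blocktype_alloc (&b);
      assert (p[i] != NULL);
    }
  toy_blocktype_free (&b, p[0]);
  again = toy_blocktype_alloc (&b);
  assert (again == p[0]);      /* the freed block is reused */
  calls = b.malloc_calls;
  toy_blocktype_destroy (&b);
  return calls;
}
```

Five allocations are served here by a single `malloc()' call, which is
the saving the data type is meant to provide.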
The `Dynarr' type implements a "dynamic array", which is similar to
a standard C array but has no fixed limit on the number of elements it
can contain. Dynamic arrays can hold elements of any type, and when
you add a new element, the array automatically resizes itself if it
isn't big enough. Dynarrs are extensively used in the redisplay
mechanism.
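A minimal sketch of such a self-resizing array follows. It is not the
interface declared in `dynarr.h': the `toy_dynarr' structure, its
functions, the initial capacity of 8 and the doubling growth policy are
all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_dynarr
{
  void *base;      /* storage for the elements */
  size_t elsize;   /* size of one element, so any type can be held */
  size_t cur;      /* number of elements in use */
  size_t max;      /* number of elements the storage can hold */
};

void
toy_dynarr_init (struct toy_dynarr *d, size_t elsize)
{
  assert (elsize > 0);
  d->base = NULL;
  d->elsize = elsize;
  d->cur = 0;
  d->max = 0;
}

/* Append one element, growing the storage first if it is full.
   Returns 0 on success and -1 if memory could not be obtained. */
int
toy_dynarr_add (struct toy_dynarr *d, const void *el)
{
  if (d->cur == d->max)
    {
      size_t newmax = d->max ? d->max * 2 : 8;
      void *newbase = realloc (d->base, newmax * d->elsize);

      if (newbase == NULL)
        return -1;
      d->base = newbase;
      d->max = newmax;
    }
  memcpy ((char *) d->base + d->cur * d->elsize, el, d->elsize);
  d->cur++;
  return 0;
}

/* Address of element I; valid only until the next add, because
   growing the array may move the storage. */
void *
toy_dynarr_at (struct toy_dynarr *d, size_t i)
{
  assert (i < d->cur);
  return (char *) d->base + i * d->elsize;
}

void
toy_dynarr_free (struct toy_dynarr *d)
{
  free (d->base);
  d->base = NULL;
  d->cur = d->max = 0;
}

/* Store the integers 0..99 and return their sum. */
long
demo_dynarr (void)
{
  struct toy_dynarr d;
  long sum = 0;
  size_t i;
  int n;

  toy_dynarr_init (&d, sizeof (int));
  for (n = 0; n < 100; n++)
    if (toy_dynarr_add (&d, &n) != 0)
      {
        toy_dynarr_free (&d);
        return -1;
      }
  for (i = 0; i < d.cur; i++)
    sum += *(int *) toy_dynarr_at (&d, i);
  toy_dynarr_free (&d);
  return sum;
}
```

Because resizing may move the storage, callers of a structure like this
should hold indices rather than element pointers across additions.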
2058 inline.c
This module is used in connection with inline functions (available in
some compilers). Often, inline functions need to have a corresponding
non-inline function that does the same thing. This module is where they
reside. It contains no actual code, but defines some special flags that
cause inline functions defined in header files to be rendered as actual
functions. It then includes all header files that contain any inline
function definitions, so that each one gets a real function equivalent.
6489 debug.c
2267 debug.h
These functions provide a system for doing internal consistency
checks during code development. This system is not currently used;
instead the simpler `assert()' macro is used along with the various
checks provided by the `--error-check-*' configuration options.
1643 prefix-args.c
This is actually the source for a small, self-contained program used